Single-cell analysis reveals context-dependent, cell-level selection of mtDNA

Heteroplasmy occurs when wild-type and mutant mitochondrial DNA (mtDNA) molecules co-exist in single cells1. Heteroplasmy levels change dynamically in development, disease and ageing2,3, but it is unclear whether these shifts are caused by selection or drift, and whether they occur at the level of cells or intracellularly. Here we investigate heteroplasmy dynamics in dividing cells by combining precise mtDNA base editing (DdCBE)4 with a new method, SCI-LITE (single-cell combinatorial indexing leveraged to interrogate targeted expression), which tracks single-cell heteroplasmy with ultra-high throughput. We engineered cells to have synonymous or nonsynonymous complex I mtDNA mutations and found that cell populations in standard culture conditions purge nonsynonymous mtDNA variants, whereas synonymous variants are maintained. This suggests that selection dominates over simple drift in shaping population heteroplasmy. We simultaneously tracked single-cell mtDNA heteroplasmy and ancestry, and found that, although the population heteroplasmy shifts, the heteroplasmy of individual cell lineages remains stable, arguing that selection acts at the level of cell fitness in dividing cells. Using these insights, we show that we can force cells to accumulate high levels of truncating complex I mtDNA heteroplasmy by placing them in environments where loss of biochemical complex I activity has been reported to benefit cell fitness. We conclude that in dividing cells, a given nonsynonymous mtDNA heteroplasmy can be harmful, neutral or even beneficial to cell fitness, but that the ‘sign’ of the effect is wholly dependent on the environment.
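The lineage argument above can be illustrated with a deliberately simplified, deterministic toy model. This is our own sketch, not a model used or fitted in this study. Each lineage keeps a fixed heteroplasmy level h, so there is no intracellular drift or selection. Its relative fitness is assumed to be 1 - s*h, where the selection coefficient s is a hypothetical parameter. Population heteroplasmy then shifts only because lineages change in abundance.

```python
def simulate_cell_level_selection(heteroplasmy, s, generations):
    """Toy model: population mean heteroplasmy under selection on cell fitness.

    heteroplasmy: fixed heteroplasmy level (0 to 1) of each lineage; it never
        changes within a lineage in this model.
    s: hypothetical selection coefficient; relative fitness is 1 - s * h, so
        s > 0 penalizes high heteroplasmy and s < 0 favors it.
    Returns the population mean heteroplasmy at generations 0..generations.
    """
    fitness = [1.0 - s * h for h in heteroplasmy]
    if any(w <= 0 for w in fitness):
        raise ValueError("fitness must stay positive; reduce |s|")
    n = len(heteroplasmy)
    freqs = [1.0 / n] * n  # all lineages start at equal abundance
    means = []
    for _ in range(generations + 1):
        # Population heteroplasmy is the abundance-weighted lineage average.
        means.append(sum(f * h for f, h in zip(freqs, heteroplasmy)))
        # Each lineage grows in proportion to its fitness; renormalize.
        total = sum(f * w for f, w in zip(freqs, fitness))
        freqs = [f * w / total for f, w in zip(freqs, fitness)]
    return means


if __name__ == "__main__":
    lineages = [0.1, 0.3, 0.5, 0.7, 0.9]  # hypothetical lineage heteroplasmies
    for s in (0.5, 0.0, -0.5):
        trajectory = simulate_cell_level_selection(lineages, s, generations=5)
        print(s, [round(m, 3) for m in trajectory])
```

With s > 0 the population mean heteroplasmy falls every generation even though no lineage's heteroplasmy changes. Setting s < 0 mimics an environment in which loss of complex I function is beneficial, and the mean rises instead.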


Heterotypic and homotypic doublets in SCI-LITE
During the sequential split-pool rounds of SCI-LITE, two cells may traverse the same path by random chance. This theoretical "collision rate," or "doublet rate," can be estimated from the total number of barcode combinations and the number of cells in the experiment22. The estimated collision rate includes both heterotypic doublets (cells from two different cell lines that traveled together) and homotypic doublets (cells from the same cell line that traveled together). To experimentally assess doublet rates in our "barnyard experiment," we not only mixed HeLa and 293T cells but also analyzed each cell line separately; in these unmixed arms, each cell line was barcoded with a predefined set of RT barcodes, which we then used to unambiguously assign reads to either cell line.
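A minimal sketch of how such a collision rate can be estimated follows. It assumes that each cell independently and uniformly lands in one of N barcode combinations (the standard birthday-problem idealization), which is not necessarily the exact calculation in ref. 22. Under that assumption, the fraction of n cells that share their barcode with at least one other cell is 1 - (1 - 1/N)^(n - 1). The function names and the wells-per-round values are hypothetical.

```python
import math


def n_barcode_combinations(wells_per_round):
    """Total barcode combinations: the product of wells used in each split-pool round."""
    return math.prod(wells_per_round)


def expected_collision_rate(n_cells, n_barcodes):
    """Expected fraction of cells that share their full barcode with >= 1 other cell.

    Assumes each cell independently picks one of n_barcodes combinations
    uniformly at random (an idealization of split-pool barcoding).
    """
    if n_barcodes < 2 or n_cells < 1:
        raise ValueError("need n_barcodes >= 2 and n_cells >= 1")
    return 1.0 - (1.0 - 1.0 / n_barcodes) ** (n_cells - 1)


def max_cells_for_rate(target_rate, n_barcodes):
    """Largest cell number whose expected collision rate stays <= target_rate."""
    if not 0.0 < target_rate < 1.0 or n_barcodes < 2:
        raise ValueError("need 0 < target_rate < 1 and n_barcodes >= 2")
    # Solve 1 - (1 - 1/N)**(n - 1) <= r for the largest integer n.
    return 1 + math.floor(math.log(1.0 - target_rate) / math.log(1.0 - 1.0 / n_barcodes))


if __name__ == "__main__":
    n_bc = n_barcode_combinations([96, 96, 96])  # hypothetical 3-round design
    print(n_bc, expected_collision_rate(10_000, n_bc), max_cells_for_rate(0.02, n_bc))
```

With these hypothetical numbers (884,736 combinations), 10,000 cells give an expected collision rate of about 1.1%. Inverting the formula, as `max_cells_for_rate` does, shows how many cells can be loaded while staying under a chosen rate such as 2%.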
In these unmixed arms we should not observe heterotypic doublets consisting of mixed HeLa and 293T reads, because the two cell lines are physically separated during the initial RT step. Cells in the unmixed arms had at most 29 UMIs per cell detected from the other cell type, so we set this value as our threshold for calling heterotypic doublets in the mixed arm of the experiment. With this threshold, we identified 6 heterotypic doublets among 748 mixed HeLa and 293T cells (a rate of ~0.8%).
Subtracting this observed heterotypic rate from our ~2% theoretical collision rate, we would expect ~1.2% of cells to be homotypic doublets. Homotypic doublets are harder to identify, because both colliding cells carry the same alleles. Many single-cell analysis pipelines assume that doublets should have higher UMI coverage than singlets because they contain twice the genetic material, and therefore filter out the cells with the highest UMI coverage as probable homotypic doublets. However, when we compared the coverage of our heterotypic doublets with the overall UMI count distribution, we found that they were generally not at the extreme end of the distribution. Hence, a simple UMI count filter would remove many true singlets while still missing many doublets. Because our expected homotypic doublet rate is so low, we chose not to implement a coverage-based doublet filter in our analysis pipeline. In general, we cannot confidently identify doublets outside of the barnyard SCI-LITE experiment, because in our other experiments all cells are expected to carry the same alleles, only at variable heteroplasmy levels. Reassuringly, the barnyard experiment shows that our doublet rate is low and consistent with the expected rate of about 2%.
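The threshold-based calling and the doublet-rate arithmetic above can be sketched as follows. The record layout (`BarnyardCell`) and all function names are hypothetical. We assume that a cell is called a heterotypic doublet when its minority cell-line UMI count exceeds the 29-UMI threshold from the unmixed arms. We also assume that the expected homotypic rate is simply the theoretical collision rate minus the observed heterotypic rate.

```python
from dataclasses import dataclass


@dataclass
class BarnyardCell:
    """Hypothetical per-cell record: UMIs assigned to each cell line."""
    barcode: str
    hela_umis: int
    hek293t_umis: int


def call_heterotypic_doublets(cells, threshold=29):
    """Barcodes whose minority cell-line UMI count exceeds threshold.

    threshold: the most other-cell-line UMIs seen in any cell of the
    unmixed arms (29 in this experiment).
    """
    return [c.barcode for c in cells if min(c.hela_umis, c.hek293t_umis) > threshold]


def doublet_rates(n_heterotypic, n_cells, theoretical_rate):
    """Observed heterotypic rate and the implied expected homotypic rate."""
    heterotypic = n_heterotypic / n_cells
    # Whatever part of the theoretical collision rate is not heterotypic
    # is attributed to homotypic doublets (floored at zero).
    homotypic = max(theoretical_rate - heterotypic, 0.0)
    return heterotypic, homotypic


def percentile_rank(value, population):
    """Fraction of population values <= value; used to check whether
    doublets sit at the extreme high end of the UMI count distribution."""
    return sum(v <= value for v in population) / len(population)


if __name__ == "__main__":
    het, homo = doublet_rates(6, 748, 0.02)  # numbers from the barnyard experiment
    print(f"heterotypic {het:.1%}, expected homotypic {homo:.1%}")
```

Run on the barnyard numbers, this prints "heterotypic 0.8%, expected homotypic 1.2%". Computing `percentile_rank` for each called doublet against all per-cell UMI totals is one way to check whether a coverage cutoff would have caught them.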

Limitations of the study
We acknowledge several limitations of this study.
First, SCI-LITE is designed for targeted analysis of selected transcripts, and therefore it requires optimization for each new target (for example, each mtDNA heteroplasmy of interest), much as qPCR assays are typically optimized for each target.
Second, although we find that SCI-LITE generates reliable data that are supported by bulk heteroplasmy estimates, we nonetheless observed a small degree of cross-contamination between analyzed samples, which arises in a few specific ways. The first is doublets: two cells can take the same path through the pooling-and-splitting steps and appear as a single doublet cell in the data analysis, as described above. In all experiments reported here, we minimized the impact of doublets on our results in two ways: (1) by using a sufficient diversity of barcodes to attain a theoretical doublet rate of no more than 2%, and (2) by assigning specific RT barcode sequences to specific samples, which makes heterotypic doublets far less likely. A second form of cross-contamination occurs when the same UMI is associated with different mtDNA variants. We observed that such cross-contamination events are more probable when more PCR cycles are used to amplify the libraries, and we therefore recommend keeping PCR cycles to a minimum when amplifying SCI-LITE libraries. We also developed and applied a computational correction, described in detail in the Methods section, that helps correct this kind of cross-contamination and assign the correct allele to each UMI. A third, very minor source of cross-contamination (often just a single UMI per cell, if present) is ambient RNA: UMIs generated from RNA molecules that leak into the solution, probably from damaged or dying cells45, and that then pass through the same combination of wells in the pooling-and-splitting process as a valid cell (Supplementary Fig. 1).
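One generic way to give each UMI a single allele is a per-UMI majority vote over its reads. The sketch below illustrates that idea under stated assumptions and is not the correction described in the Methods. The input layout (one `(cell, UMI, allele)` tuple per read), the 0.75 support cutoff and the function names are all hypothetical. UMIs without a clear majority are dropped rather than guessed, and heteroplasmy is then computed per cell from the resolved UMIs.

```python
from collections import Counter, defaultdict


def resolve_umi_alleles(reads, min_fraction=0.75):
    """Assign one allele per (cell, UMI) by majority vote over its reads.

    reads: iterable of (cell_barcode, umi, allele) tuples, one per read.
    min_fraction: hypothetical cutoff; UMIs whose top allele is supported by
        fewer than this fraction of their reads are dropped as ambiguous.
    Returns {(cell_barcode, umi): allele}.
    """
    counts = defaultdict(Counter)
    for cell, umi, allele in reads:
        counts[(cell, umi)][allele] += 1
    resolved = {}
    for key, allele_counts in counts.items():
        allele, n = allele_counts.most_common(1)[0]
        if n / sum(allele_counts.values()) >= min_fraction:
            resolved[key] = allele
    return resolved


def heteroplasmy_per_cell(resolved, mutant_allele):
    """Fraction of each cell's resolved UMIs that carry mutant_allele."""
    totals = Counter()
    mutants = Counter()
    for (cell, _umi), allele in resolved.items():
        totals[cell] += 1
        if allele == mutant_allele:
            mutants[cell] += 1
    return {cell: mutants[cell] / totals[cell] for cell in totals}
```

Dropping ambiguous UMIs trades a small loss of coverage for fewer misassigned alleles. Because PCR-induced cross-contamination grows with cycle number, the fraction of UMIs dropped at a given cutoff could also serve as a rough per-library quality check.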
Third, we have used DdCBE to model mtDNA heteroplasmy primarily in immortalized cell lines. While models of mtDNA disease have largely been lacking thus far, the advent of mtDNA base editing now makes it possible to generate primary cell models or animal models to further validate our results.
Fourth, our in vivo tumor xenograft model supports the importance of the environment-dependent selection of mtDNA heteroplasmy that we observed in cultured cells, but a deeper understanding of the mechanism requires further study. In the in vivo tumor model, which showed that the introduced complex I truncating mtDNA heteroplasmy confers a growth benefit to the tumor, we used a thyroid follicular cell line (Nthy-ori 3-1) that has been immortalized with a single copy of the SV40 T antigen plasmid33. A recent analysis identified missense variants in KMT2A, POLE and CHEK2 in this cell line34. All of these variants have unspecified functional effects, but none of them are known to be drivers of tumorigenesis in Hürthle cell carcinomas8,46. Future studies are required to determine whether the introduced complex I mutation is required transiently or long-term in the tumor, and what advantage it confers to the cells for promoting tumor growth.